20 research outputs found

    Turkish Discourse Bank: Porting a discourse annotation style to a morphologically rich language

    This paper briefly describes the Turkish Discourse Bank, the first publicly available annotated discourse resource for Turkish. It focuses on the challenges posed by annotating Turkish, a free word order language with rich inflectional and derivational morphology. It shows the usefulness of the PDTB style annotation but points out the need to extend this annotation style to meet the needs of the target language.

    The annotation scheme of the Turkish Discourse Bank and an evaluation of inconsistent annotations

    In this paper, we report on the annotation procedures we developed for annotating the Turkish Discourse Bank (TDB), an effort that extends the Penn Discourse Tree Bank (PDTB) annotation style by using it for annotating Turkish discourse. After a brief introduction to the TDB, we describe the annotation cycle and the annotation scheme we developed, defining which parts of the scheme are an extension of the PDTB and which parts are different. We provide inter-coder reliability calculations on the first and second arguments of some connectives and discuss the most important sources of disagreement among annotators.

    Annotating Subordinators in the Turkish Discourse Bank

    In this paper we explain how we annotated subordinators in the Turkish Discourse Bank (TDB), an effort that started in 2007 and is still continuing. We introduce the project and describe some of the issues that were important in annotating three subordinators, namely karşın, rağmen and halde, all of which encode the coherence relation Contrast-Concession. We also describe the annotation tool.

    Türkçe’nin söylem yapısı (The discourse structure of Turkish)

    This thesis investigates the structure of immediate discourse in Turkish. The first and foremost question is how discourse is built. Are there components of discourse that constitute a predicate-argument structure, or is discourse realized by underlying non-structural ties that are merely made explicit by these components? If there is structure in discourse, what is the nature of this structure, and what is its complexity? For this purpose, we analyze the relations annotated in the Turkish Discourse Bank, and their counterparts annotated on the Spoken Turkish Corpus Demo specifically for this study. Through close examination of inter-relational configurations identified in these corpora, we investigate deviations from tree structure and attempt to eliminate the deviations without compromising the meaning of the text. We show that while some of these deviations can be explained away, some of them stem from the nature of discourse as well as syntactic asymmetries of the components of the discourse relations, and should be accommodated by the discourse theory. Building upon our findings from the data, we discuss what role discourse connectives play in building the discourse structure. We argue that although discourse relations are best represented as logical predicates, they are fundamentally different from sentence-level predicates. Our conclusion is that the discourse relations anchored by explicit discourse connectives and the inferences represented by implicit discourse connectives are a representation of the structure we perceive in the text, as opposed to sentence-level predicates that build an argument structure and impose linguistic restrictions on their arguments. Ph.D. - Doctoral Program

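    Türkçe yazılı metinlerde söylem bağlaçlarının bağlaç konumu, öğe dizilimi ve bilgi yapısı (Connective position, argument order and information structure of discourse connectives in Turkish written texts)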

    A text is a linguistic structure that is more than a random collection of sentences. A text is cohesive (Halliday & Hasan, 1976) and coherent (Mann & Thompson, 1987, 1988). Mainly ignored in the field of linguistics until recently, the text and the discourse structure have been investigated from various points of view (Asher, 1993; Asher & Lascarides, 1998; Grosz & Sidner, 1986; Mann & Thompson, 1987, 1988; Webber, 2004). D-LTAG is a discourse grammar that extends a lexicalized sentence-level grammar, LTAG (Joshi, 1987), to low-level discourse (Webber, 2004; Webber & Joshi, 1998). In this framework, discourse connectives such as coordinating conjunctions, subordinating conjunctions, parallel connectives and discourse adverbials are predicates of discourse structure that take text spans that can be interpreted as abstract objects (Asher, 1993). Turkish has a flexible word order in comparison to languages like English. In English, the discourse adverbials are noted for their ability to occupy positions unavailable to other discourse connectives. In Turkish, the word order of the other discourse connectives, coordinators and subordinators, is not expected to be as restricted. This thesis examines the connective position, argument order and the information structure of five Turkish discourse connectives in their eleven uses. The analyses show that the examined features of discourse connectives are related to the syntactic group the connective belongs to. Discourse connectives of the same syntactic groups exploit similar connective position and argument order possibilities, and they tend to be included in similar information units. M.S. - Master of Science

    Pair Annotation as a Novel Annotation Procedure: The Case of Turkish Discourse Bank

    In this chapter, we provide an overview of Turkish Discourse Bank, a resource of ∼400,000 words built on a sub-corpus of the 2-million-word METU Turkish Corpus annotated following the principles of Penn Discourse Tree Bank. We first present the annotation framework we adopted, explaining how it differs from the annotation of the original language, English. Then we focus on a novel annotation procedure that we have devised and named pair annotation after pair programming. We discuss the advantages it has offered as well as its potential drawbacks.

    Pair Annotation: adaption of pair programming to corpus

    This paper introduces a procedure that we call pair annotation after pair programming. We describe the initial annotation procedure of the TDB, followed by the inception of the pair annotation idea and how it came to be used in the Turkish Discourse Bank. We discuss the observed benefits and the issues encountered during the process, and conclude by discussing the major benefit of pair annotation, namely higher inter-annotator agreement values.